An Objective Method for Forecasting Solar Flares


1. INTRODUCTION

This report is a continuation of an earlier study (Hirman et al, 1980) in which
multivariate discriminant analysis (MVDA) is used in a computer program to
produce an objective daily solar flare forecast. The essential feature of the
statistics package is the comparison between a number of input parameters and a
number of output classes, in which the discrimination between the classes in
terms of the input parameters is maximized by constructing appropriate
classification functions. In the application to flare prediction, the input
parameters are daily solar parameters for each active region on the solar disk,
and the output classes are the levels of flare activity occurring the following
day within the same active regions. We have used more than two years of data,
of which approximately 25 percent has been used to derive the classification
functions. The latter are then extrapolated forward in time to produce a true
forecast.

The computer program, known as BMD07M, was originally written at UCLA,^1
although the particular version used here was developed further by Seagraves^2 to

Received for publication 3 Feb 1981

1. Dixon, W. J. (ed.) (1968) Biomedical Computer Programs, University of
   California Publications in Automatic Computations, No. 2, University of
   California Press, p. 214a.

2. Seagraves, P. H. (1972) UBC BMD07M Stepwise Discriminant Analysis,
   University of British Columbia Computing Centre Documentation.







include the Cooley and Lohnes classification procedure,^3 and the Lachenbruch
N-1 technique.^4 The Cooley and Lohnes procedure does not assume uniformity
of variance, and this sometimes results in better classification scores. The
computational burden, however, is increased because linear classification
functions are not possible; instead, canonical variables, constructed from the
original input parameters, are used as a transformation to reduce the matrix
dimension in the classification formulas. The Lachenbruch technique removes
bias when the program classifies its own data base.
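
As a rough illustration of what "classification functions" means here, the following sketch fits ordinary linear discriminant functions under a pooled-covariance assumption and converts the resulting scores to class probabilities. It is not the BMD07M code: the stepwise parameter selection, the Cooley and Lohnes variant, and the Lachenbruch technique are all omitted, and every function and variable name is hypothetical.

```python
import numpy as np

def fit_linear_classifier(X, y):
    """Fit one linear classification function per class (pooled covariance assumed)."""
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([(y == k).mean() for k in classes])
    # Pooled within-class covariance matrix.
    pooled = np.zeros((p, p))
    for k, m in zip(classes, means):
        d = X[y == k] - m
        pooled += d.T @ d
    pooled /= (n - len(classes))
    inv = np.linalg.pinv(pooled)
    coefs = means @ inv  # one row of coefficients per class
    consts = -0.5 * np.sum(coefs * means, axis=1) + np.log(priors)
    return classes, coefs, consts

def class_probabilities(X, coefs, consts):
    """Convert classification-function scores to probabilities that sum to one."""
    scores = X @ coefs.T + consts
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)
```

A record is assigned to the class whose function gives the largest score; the probabilities over the mutually exclusive classes sum to unity.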

A complete description of the mathematics is beyond the scope of this report.
The reader may consult Anderson^5 and Rao^6 for references on discriminant
analysis. A discussion of the suitability of applying various statistical methods to
discrete input variables is contained in Vecchia et al.^7 The latter point is of
particular interest because the work of Vecchia et al uses the same discrete data
base as used herein, to produce solar flare probability forecasts using discriminant
analysis (without the Cooley and Lohnes procedure) and logistic regression
analysis.

An important feature of the present study is the comparison of the objective,
computer forecast with a subjective, conventional forecast prepared during the
same test period for the same active regions on the sun. Without such a
benchmark for relative evaluation, the presentation of any forecast method has
considerably reduced merit.

2. DATA


The data used herein were obtained from the region analysis program at the
NOAA Space Environment Services Center (SESC) in Boulder, Colorado. The
region analysis program collects daily a variety of solar parameters for each
active region on the solar disk. It is important to note that there is no attempt

3. Cooley, W. W., and Lohnes, P. R. (1962) Multivariate Procedures for the
   Behavioral Sciences, Wiley, New York.

4. Lachenbruch, P. A., and Mickey, M. R. (1968) Technometrics, 10:1.

5. Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis,
   Wiley, New York.

6. Rao, C. R. (1974) Advanced Statistical Methods in Biometric Research,
   Hafner.

7. Vecchia, D. F., Caldwell, G. A., Tryon, P. V., and Jones, R. H. (1980)
   in Sol.-Terres. Pred. Proc., Vol. 3, R. F. Donnelly (ed.), C-76.
in this program to select the more flare-productive regions. The parameters
include radio and X-ray data, but most are derived from optical data supplied
by the USAF/AWS SOON system. The parameters contain information which the
SESC forecasters consider vital to the preparation of a 24-hour flare forecast.
The present study uses data for the period 1 January 1977 to 31 January 1979,
containing 6095 active-region days (records) that have been checked for errors
and internal consistency. Random scrutiny, however, has shown that errors
still remain. After several reassignments of parameter values and definitions,
we arrived at the form of the data base shown in Table 1.


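Table 1. SESC Region Analysis Parameters (Modified)

PARAMETER                                              ASSIGNED VALUE

 1. DATE
 2. REGION NUMBER
 3. REGION'S FIRST APPEARANCE LONGITUDE
 4. CURRENT LONGITUDE
 5. N/S LATITUDE
 6. CURRENT LATITUDE
 7. CARRINGTON LONGITUDE
 8. REGION AGE
 9. SPOT CLASS 1
      A ................................................ 1
      B ................................................ 2
      C ................................................ 4
      D ................................................ 6
      E ................................................ 7
      F ................................................ 8
      H ................................................ 3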















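10. SPOT CLASS 2
      r/x .............................................. 1
      a ................................................ 3
      h ................................................ 4
      k ................................................ 5
11. SPOT CLASS 3
      x ................................................ 1
      o ................................................ 2
      i ................................................ 3
      c ................................................ 4
12. MAGNETIC CLASS
      No spots ......................................... 0
      Alpha ............................................ 1
      Beta ............................................. 2
      Beta-Gamma ....................................... 3
      Gamma ............................................ 4
      Beta-Delta ....................................... 5
      Beta-Gamma-Delta ................................. 6
      Gamma-Delta ...................................... 7
13. MAGNETIC POLARITY OF STRONGEST FIELD ............... (+/-)
14. MAGNETIC FIELD STRENGTH ............................ (Gauss)
15. MAGNETIC GRADIENTS ................................. (Gamma/km)
16. INTERACTION WITH ANOTHER REGION
      None ............................................. 0
      Spots of opposite polarity converge (from less
      than two degrees apart) .......................... 1
17. SUNSPOT DYNAMICS
      No spots or no motion ............................ 0
      Coalescing of spots .............................. 1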

































































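    PLAGE COMPACTNESS
      Non-compact ...................................... 0
      Compact .......................................... 1
    NEUTRAL LINE ORIENTATION
      Weak structure ................................... 0
      North-south (± 45 deg) ........................... 1
      East-west (± 45 deg) ............................. 2
      Hairpin .......................................... 3
      Circular ......................................... 4
    REVERSE POLARITY
      Normal polarity .................................. 0
      Reverse polarity ................................. 1
    NEUTRAL LINE COMPLEXITY
      Straight line or weak structure .................. 0
      1-3 kinks ........................................ 1
      4-6 .............................................. 2
      7-9 .............................................. 3
      >9 ............................................... 4
    NEUTRAL LINE CHANGES
      No trend ......................................... 0
      Becoming simple .................................. 1
      Becoming complex ................................. 2
    BRIGHT POINTS
      None ............................................. 0
      Occurred, but not along neutral line ............. 1
      Occurred along neutral line ...................... 2
    PLAGE FLUCTUATIONS
      None ............................................. 0
      Occurred ......................................... 1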































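31. ISOLATED POLE
32. EMERGING FLUX
      None, or region is new ........................... 0
      New flux emerges within spot group ............... 1
      New flux emerges near region (within [illegible] deg)  2
33. ARCH FILAMENT SYSTEM
34. RADIO BURST/SWEEP
      None occurred .................................... 0
      >250 flux units at 10 cm ......................... 1
      >1000 flux units at 10 cm ........................ 2
      Type III ......................................... 3
      Type IV .......................................... 4
      Type II and IV ................................... 5
      U burst .......................................... 6
      Major/complex 10 cm burst ........................ 7
      >1000 flux units at 10 cm plus a U burst, or
      Type III and IV, or
      >250 flux units at 10 cm plus Type III and IV .... 8
35. REGION'S FIRST APPEARANCE (TRANSIT HISTORY)
36. FLARE HISTORY
      No flares have occurred .......................... 0
      C class flares have occurred ..................... 1
      M class flares have occurred ..................... 2
      X class flares have occurred ..................... 3
37. FLARES TODAY
      None ............................................. 0
      C class .......................................... 1
      M class .......................................... 2
      X class .......................................... 3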





























38. PROTON HISTORY
      None occurred .................................... 0
      Proton event occurred ............................ 1
      Ground level event ............................... 2
39. PROTONS TODAY
      None ............................................. 0
      Occurred ......................................... 1
40. REGION FORECASTS (SESC)
      Probabilities for each class of flare (none, C, M, or X) for
      each region, for the 24-hour period beginning at 0 hr UT next
      day. Proton event probabilities are similarly stated.


Most of the parameters in Table 1 have been assigned discrete values
according to categories which are subjectively related to increasing flare
activity. This subjectivity is the weakest link in any scheme utilizing objective
procedures for producing a forecast solely from data. In essence, the situation
merely allows the element of subjectivity to reside entirely in the data
acquisition process. Probably, this situation is preferable to having subjectivity
introduced also in the forecast preparation. There are several parameters
(e.g., spot class, flare history, magnetic class) for which assigned values are
based upon quantitative studies. Fortunately (or perhaps therefore!), these
parameters are among those from which the objective forecast derives most of
its skill.

Perhaps the most unfortunate circumstance is that for a large number of
records one or more parameters are missing. In the computer program, missing
data codes are replaced by averages for the particular parameter in the
set of records used in deriving the classification functions. Missing data, in
addition to errors, make the testing of objective techniques difficult,
especially for determining the relative significance of various parameters. In
order to give some feeling for the degree of representation in the data base
we note the following: for three commonly observed parameters, "Spot Class
2," "Magnetic Class," and "Flares Today," only 5893 of the total 6095 records
contain all three; if "Bright Points," "Spot Class 3," "Spot Class 1," "Magnetic
Gradients," and "Sunspot Dynamics" are added to the first three, only 3732
records remain; and for a total of 15 of the 31 usable parameters, only 510
records contain all 15. This is, indeed, a hardship for statistical analysis.
Nevertheless, we are able to show later that at least some of these frequently
missing parameters contain valuable predictive information.
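
The missing-data replacement described above can be sketched as follows. This is a minimal illustration, assuming missing-data codes have already been converted to NaN and that every parameter is present in at least one training record; the function name and array layout (records in rows, parameters in columns) are hypothetical and not taken from the original program.

```python
import numpy as np

def impute_with_training_means(train, test):
    """Replace NaN entries by the per-parameter mean of the training records."""
    means = np.nanmean(train, axis=0)  # averages over the training set only
    train_filled = np.where(np.isnan(train), means, train)
    test_filled = np.where(np.isnan(test), means, test)
    return train_filled, test_filled
```

Using only training-set averages keeps the test records free of any information from their own period, which matches the forecast setting.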

The data base contains daily, region-by-region entries for the actual flare
activity, in addition to the official SESC subjectively derived flare forecast.
Thus, the information required for objective forecast testing, as well as for
comparison with the SESC forecast, is contained in the same base. Flares are
listed according to their peak soft (1-8 Å) X-ray flux at 1 AU:


Class C:  $10^{-6} \le E < 10^{-5}$ watt m$^{-2}$

Class M:  $10^{-5} \le E < 10^{-4}$ watt m$^{-2}$

Class X:  $E \ge 10^{-4}$ watt m$^{-2}$
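
The class boundaries can be expressed as a small lookup. This sketch uses the standard soft X-ray thresholds (C from 1e-6, M from 1e-5, X from 1e-4 watt per square meter); treating anything below the C threshold as "No Flare", and the function names themselves, are assumptions made for illustration.

```python
def flare_class(peak_flux_w_m2):
    """Return the X-ray class for a peak soft X-ray flux in watt m^-2."""
    if peak_flux_w_m2 >= 1e-4:
        return "X"
    if peak_flux_w_m2 >= 1e-5:
        return "M"
    if peak_flux_w_m2 >= 1e-6:
        return "C"
    return "No Flare"  # assumption: sub-C events count as no flare

def outcome_class(peak_flux_w_m2):
    """Group M and X into the single 'M & X' outcome used for forecasting."""
    cls = flare_class(peak_flux_w_m2)
    return "M & X" if cls in ("M", "X") else cls
```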


From the standpoint of geophysical environment studies, the classes M and X
are of greatest importance.

In addition to Table 1, six combination parameters (Table 2), derived from
certain original parameters, were included as input parameters. These six were
found to have possible predictive significance in the earlier study^8 where twenty
such combination parameters were tested. The derivation of combination
parameters is based on intuitions about the form in which predictive information
might be contained in the data, and about physical quantities (e.g., energy stored
in sheared magnetic fields) presumed relatable to flares. The subject of these and
other combination parameters will be discussed in a later section.

Table 2. Combination Parameters (Numbers in right-hand
column refer to original parameter numbers in Table 1)

8. Hirman, J. W., Neidig, D. F., Seagraves, P. H., Flowers, W. E., and
   Wiborg, P. H. (1980) in Sol.-Terres. Pred. Proc., Vol. 3, R. F. Donnelly
   (ed.), C-64.
The region analysis parameters for today are independent of any information
on flare activity occurring tomorrow; therefore, they can be used in practice,
today, to produce a flare forecast for tomorrow, assuming that predictive
information is present in the parameters. We have used the first N records (with
N = 1500, as described below) as a "training set" in order to derive the
classification functions for three possible outcomes: "No Flare," "C Flare," and
"M or X Flare." M and X flares were grouped together as a single class in order
to reduce statistical noise caused by the relatively few cases of larger flares.
The classification functions were then applied to new records, using only the
input parameters, in order to produce a true forecast. The latter procedure was
accomplished in steps of 250 records each, with the training set sliding forward
in time, 250 records (approximately one month) after each step. Thus, for a
1500-record training set, the remaining 4595-record test set requires 19
individual subtests of 250 records each (except for the nineteenth). This sliding
base technique maintains a constant N records in the training set, thereby
assuring that the program is trained on recent data relative to the test subset.
This, combined with the relatively small size of the test subset, minimizes the
effects of secular trends, either of observational or solar origin, which might
be present in the data.
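
The sliding base technique can be sketched as a generator of (training, test) index ranges. This is an illustrative reconstruction under the stated sizes (1500-record training set, 250-record step, 6095 records in all); the function name and the use of Python ranges are assumptions.

```python
def sliding_windows(n_records, train_size=1500, step=250):
    """Return (train_range, test_range) pairs for a sliding training set."""
    windows = []
    start = 0
    while start + train_size < n_records:
        train = range(start, start + train_size)
        test_end = min(start + train_size + step, n_records)
        test = range(start + train_size, test_end)  # the next block of records
        windows.append((train, test))
        start += step  # slide the training set forward by one step
    return windows

windows = sliding_windows(6095)
```

With these sizes the sketch yields 19 subtests, the last containing only the 95 records left over, which is consistent with the remark that the nineteenth subtest is shorter than the others.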

The computer program was trained on the X-ray class of the largest event
(No Flare, C Flare, or M & X Flare) occurring in the region in the 24-hour
period following the acquisition date of the input parameters. Thus, the computer
forecast is expressed in terms of probabilities for the largest event to be in one
of these classes. The outcomes are mutually exclusive, with the sum of
probabilities over all classes equal to unity. The SESC forecast, however, is a
probability forecast for the occurrence of each class of event; i.e., a
non-exclusive format. In order to assess the quality of the computer forecast, we
derived a comparison forecast in the "exclusive" format by selecting the largest
event class in the SESC forecast that was assigned a probability greater than or
equal to 0.5. Although this is not an SESC forecast, it is probably representative
of what would be extant if the SESC chose to cast their predictions in this mode.
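
The conversion from the non-exclusive SESC probabilities to an "exclusive" comparison forecast might look like the sketch below. Two details are assumptions not spelled out in the text: a region with no class at or above 0.5 is treated as "No Flare", and the grouped "M & X" outcome is chosen when either the M or the X probability reaches the threshold. The function name and dictionary keys are hypothetical.

```python
def exclusive_forecast(probs, threshold=0.5):
    """Pick the largest event class whose forecast probability is >= threshold.

    probs: dict with keys "C", "M", "X" holding non-exclusive probabilities.
    """
    if probs.get("M", 0.0) >= threshold or probs.get("X", 0.0) >= threshold:
        return "M & X"
    if probs.get("C", 0.0) >= threshold:
        return "C"
    return "No Flare"  # assumption: nothing reached the threshold
```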

In the following test results we present the forecasts according to both the
standard multivariate discriminant analysis (MVDA) and the Cooley and Lohnes
procedure (MVDA/CL). There are important differences in the character of
these two forecasts, which, as will be shown later, may be used to advantage.
4.1 Preliminary Discussion

We are concerned mainly with the behavior of the computer forecasts relative
to the comparison forecast when, for example, changes are made in the
size of the training set, choice of input parameters, solar activity levels, and
percent of missing data. In all cases we present the computer forecast along
with the comparison forecast for the same set of test records. Also included is
a list of input parameters submitted to analysis, along with their frequency of
selection in classifying the three outcomes. Note, however, that due to the
250-record increment the training sets are independent of each other only when
separated by six or more subsets.

As a first step, we eliminated 11 parameters which were not selected in any
of the 19 subsets. Following this, the program was run again using the remaining
input parameters. The results are given in Tables 3, 4, and 5. This
test (A) will serve as an example for the displays used elsewhere in this report.

Table 3 shows the actual matrix of region-day forecasts vs region-day
largest events, for the three forecasts. Table 5, derived from the data in Table
3, summarizes the following:

F      Percent of forecasts correct in the given event class

E      Percent of region-day largest events which were forecasted

A      (F + E)/2

C      Climatology (percent of the total number of events in the class)

U      Unweighted mean of the A's for all three event classes

W      Weighted mean forecast accuracy (the sum of the matrix
       diagonal elements divided by the total number of forecasts,
       or events, in all classes)

Off 1  Percent of forecasts that are one matrix element away from
       the diagonal

Off 2  Percent of forecasts that are two matrix elements away from
       the diagonal
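
The score definitions above translate directly into a few array operations. The sketch below assumes a square matrix with forecast classes in rows and observed classes in columns, and no empty row or column; the function name and dictionary keys are hypothetical. The example matrix holds the comparison-forecast counts reported for Test A.

```python
import numpy as np

def forecast_scores(matrix):
    """Compute F, E, A, C, U, W, Off 1 and Off 2 (all in percent)."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    diag = np.diag(m)
    F = 100.0 * diag / m.sum(axis=1)   # forecasts that were correct
    E = 100.0 * diag / m.sum(axis=0)   # events that were forecasted
    A = (F + E) / 2.0
    C = 100.0 * m.sum(axis=0) / total  # climatology of each class
    idx = np.arange(m.shape[0])
    dist = np.abs(idx[:, None] - idx[None, :])  # distance from the diagonal
    return {
        "F": F, "E": E, "A": A, "C": C,
        "U": A.mean(),
        "W": 100.0 * diag.sum() / total,
        "Off 1": 100.0 * m[dist == 1].sum() / total,
        "Off 2": 100.0 * m[dist == 2].sum() / total,
    }

# Rows: forecast No Flare, C, M & X; columns: observed No Flare, C, M & X.
comparison = [[3376, 213, 25], [501, 199, 59], [77, 82, 63]]
scores = forecast_scores(comparison)
```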

These various scores are of interest because of the several ways in which
forecasts can be used. For example, the F score, or percentage of forecasts
that are correct, is the quantity of interest to a customer who cannot tolerate
false alarms. A quite different requirement applies, however, in a situation
where surprise flares are unwelcome. In the latter case, the E score is the
important measurement. Of course, knowing the customer's need in advance
allows the forecast to be biased either toward underprediction, which tends to
improve the F score, or toward overprediction, which improves the E score.
Table 3. Comparison of Forecasts--Test A
(1500-record training set)

Largest Event             Largest Event Observed           Total
Forecasted             No Flare       C       M & X      Forecasts

COMPARISON
  No Flare               3376        213        25          3614
  C                       501        199        59           759
  M & X                    77         82        63           222
  Total Events           3954        494       147          4595

MVDA
  No Flare               3349        190        26          3565
  C                       513        206        51           770
  M & X                    92         98        70           260
  Total Events           3954        494       147          4595

MVDA/CL
  No Flare               3739        316        50          4105
  C                       185        142        50           377
  M & X                    30         36        47           113
  Total Events           3954        494       147          4595



As a measure of the "balanced" accuracy of a forecast in a given event class
we, therefore, introduce the average of F and E, given by A.

The accuracy of a forecast is always dependent upon the climatology for
the event being forecasted. Higher climatological probabilities tend to improve
the chances for predictions to be correct. For example, it is easy to predict
"No Flare" with 90 percent accuracy, simply because no flare occurs in almost
90 percent of all active-region days. In comparing cumulative scores between
forecasts it is imperative to note the climatology which prevailed during the
test period. Climatology is affected by a number of factors, including event
classification criteria, duration of forecast interval, and level of solar activity.
Table 4. Parameters Submitted to Analysis and
Their Frequency of Selection in 19 Subsets--Test A

Flares Today    19      New No. 1      9      Mag. Pol.       5
Bright Points   19      Mag. Grad.     9      Neut. L. Chg.   5
New No. 2       17      Mag. Class     6      Spot Class 3    [illegible]
Spot Dynam.     12      Radio B/S      6      Spot Class 2    3
New No. 3       12      Flare Hist.    6      Spot Inter.     3
Proton Hist.    11      New No. 5      6      Emerg. Flux     1
Spot Class 1     9      New No. 4      6


Table 5. Comparison of Forecast Scores--Test A

Forecaster    Event        F      E      A      C      U      W    Off 1   Off 2

COMPARISON    No Flare    93.4   85.4   89.4   86.1
              C           26.2   40.3   33.3   10.8   52.8   79.2   18.6    2.2
              M & X       28.4   42.9   35.6    3.2

MVDA          No Flare    93.9   84.7   89.3   86.1
              C           26.8   41.7   34.3   10.8   53.6   78.9   18.4    2.6
              M & X       26.9   47.6   37.3    3.2

MVDA/CL       No Flare    91.1   94.6   92.8   86.1
              C           37.7   28.7   33.2   10.8   54.3   85.5   12.8    1.7
              M & X       41.6   32.0   36.8    3.2
In essence, climatology is directly dependent upon "bin size." Failure to state
climatological conditions clearly (an unfortunately common practice) makes
intercomparison of forecasts almost impossible.^9 It seems that this point
cannot be emphasized enough.

Because "No Flare" constitutes the majority of situations on the sun, it comes
as no surprise that solar flare forecasts are usually quite accurate overall; i.e.,
their weighted means (W) are high. It is of greater interest, however, to predict
flares than quiet conditions and, for this reason, the unweighted score U, given
simply by the mean of the A scores over all classes, has been included in Table 5.

Finally, we note that if a forecast is in error, it is better to be wrong by one 
event class than by two. Thus, the tendency for the off-diagonal entries in the 
matrix to cluster near the diagonal is an important measure when comparing fore¬ 
cast scores which are similar otherwise. Table 5 includes a measure of this 
error distribution in the form of the Off 1 and Off 2 scores. 

The scores (F, E, and A) have uncertainties of approximately ±1, ±3, and
±5 for No Flare, C Flare, and M & X Flare, respectively. The U and W scores
have uncertainties of about ±1. Thus, in terms of A and U, the three forecasts
in Table 5 are essentially identical. The MVDA/CL forecast definitely excels in
the W score, although this is mainly due to its tendency for underprediction, which
places a large number of forecasts in the No Flare category. The tendency for
underprediction in the MVDA/CL forecast is evident also in the F scores for C
and M & X flares, which are significantly higher than the corresponding E scores.
On the other hand, both the comparison and the MVDA forecast are biased toward
overprediction. Their overall similarity is quite striking.

4.2 Effect of Training Set Size 

The number of records to be used in the training set should be large enough 
to provide sufficient statistics to train the computer program, yet small enough 
to avoid the effects of trends in the data. The optimum number, while not known 
from theory, may be determined empirically by varying the training set size and 
comparing the scores of the resulting forecasts. Table 6 shows the results for 
training sets of 750 and 2095 records. Together with Table 5 (1500-record
training set) we find differences of only small significance. A close examination of


9. Simon, P., Smith, J. B., Ding, Y., Flowers, W., Guo, Q., Harvey,
   K. L., Hedeman, R., Martin, S. F., McKenna Lawlor, S., Lin, V.,
   Neidig, D., Obridko, V. N., Dodson Prince, H., Rust, D., Speich, D.,
   Starr, A., and Stepanyan, N. N. (1980) in Sol.-Terres. Pred. Proc.,
   Vol. 2, R. F. Donnelly (ed.), p. 287.
Table 6. Comparison of Scores Using 750-Record
and 2095-Record Training Sets--Tests B and C

Forecaster      Event        F      E      A      U      W    Off 1   Off 2

MVDA            No Flare    93.9   86.0   89.9
750 Records     C           28.4   41.3   34.8   53.3   80.0   17.0    3.0
                M & X       25.2   45.1   35.1

MVDA/CL         No Flare    91.7   93.7   92.7
750 Records     C           36.2   32.7   34.4   52.8   85.1   13.1    1.8
                M & X       36.3   26.0   31.2

MVDA            No Flare    93.9   84.0   89.0
2095 Records    C           25.9   42.8   34.4   53.6   78.4   19.4    2.2
                M & X       28.6   46.4   37.5

MVDA/CL         No Flare    91.0   95.0   93.0
2095 Records    C           38.0   29.4   33.7   54.3   85.8   12.8    1.4
                M & X       45.8   26.4   36.1

the trend in the various scores, however, suggests that there may be some
improvement, especially in the MVDA/CL forecast, as the size of the training
set is increased from 750 to 1500 records. The improvement is less certain in
increasing the set from 1500 to 2095. According to motivations which will be
described later, the E score is of interest in the case of the MVDA forecast,
while the F score is of prime importance for MVDA/CL. Noting these, the U
scores, and the fact that we do not wish to make the training set unnecessarily
large, we have decided to use 1500 records in all training sets.

4.3 Inclusion of Additional Combination Parameters 

Table 4 indicates that five of the six combination parameters from Table 2
were retained for analysis after the initial parameter selection. Because several
of these ranked highly in frequency of selection in Test A, we decided to test
additional combination parameters. As in the case of the original six, the
additional parameters were derived on the basis of intuition. Their formulas are
given in Table 7.

The 20 new combination parameters, in addition to the 20 parameters used
in Test A, were submitted to analysis in Test D (Tables 8 and 9). It is convenient
to defer the discussion of the latter to the following section.






Table 7. Additional Combination Parameters (Numbers
in right-hand column refer to original parameter numbers
in Table 1)

New Parameter No.       Parameter Formula

Rates of Change
         7              29 (today) - 29 (yesterday)
         8              37 - 37
         9              (9·10·11·12) - (9·10·11·12)
        10              17 - 17
        11              (12·17·27) - (12·17·27)
        12              38 - 38
        13              9 - 9
        14              (9·10·11) - (9·10·11)
        15              15 - 15
        16              12 - 12

Parameters Squared
        17              38^2
        18              37^2
        19              (9·10·11·12)^2
        20              17^2
        21              9^2
        22              (New 7)^2
        23              (New 8)^2
        24              (New 9)^2
        25              (New 10)^2
        26              (New 13)^2
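
To make the Table 7 formulas concrete, the sketch below computes four of them (New 7, 9, 19, and 22) for one region. It assumes each day's record is available as a mapping from the Table 1 parameter number to its assigned value; that layout and the function name are hypothetical.

```python
from math import prod

def combination_parameters(today, yesterday):
    """Compute a few Table 7 style combination parameters for one region.

    today, yesterday: dicts mapping Table 1 parameter number -> assigned value.
    """
    def product(record, numbers):
        return prod(record[n] for n in numbers)

    new = {}
    new[7] = today[29] - yesterday[29]  # rate of change of parameter 29
    new[9] = product(today, (9, 10, 11, 12)) - product(yesterday, (9, 10, 11, 12))
    new[19] = product(today, (9, 10, 11, 12)) ** 2  # squared product
    new[22] = new[7] ** 2                            # (New 7) squared
    return new
```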
Table 8. Parameters Submitted to Analysis and Their Frequency
of Selection in 19 Subsets--Tests D, E, F, G, and H

Test D (40 parameters)
  Flares Today    19      Radio B/S       6      Flare Hist.     2
  New 18          17      New 9           6      New 4           2
  New 2           16      New 12          6      New 23          2
  Bright Pts.     14      New 1           5      Spot Class 2    1
  New 19          12      Neut. L. Chg.   5      Emerg. Flux     1
  New 15          10      New 5           5      New 17          1
  Mag. Grad.       9      New 14          5      New 19          1
  Proton Hist.     9      New 21          5      Spot Class 1    0
  New 3            9      New 22          5      New 11          0
  New 8            9      Mag. Pol.       4      New 13          0
  New 20           8      New 7           4      New 16          0
  Mag. Class       7      Spot Inter.     3      New 26          0
  New 10           7      New 25          3
  Spot Class 3     6      New 20          2

Test A (20 parameters)
  See Table 4

Test E (15 parameters)
  Flares Today    19      Spot Class 3   14      Radio B/S       6
  Bright Pts.     19      Spot Dynam.    13      Spot Class 1    5
  Mag. Class      16      Proton Hist.   11      Mag. Pol.       5
  Mag. Grad.      16      Flare Hist.    10      Emerg. Flux     4
  Spot Class 2    14      Neut. L. Chg.   6      Spot Inter.     3

Test F (8 parameters)
  Flares Today    19      Spot Class 2   16      Spot Dynam.    11
  Bright Pts.     19      Spot Class 3   14      Spot Class 1    5
  Mag. Class      17      Mag. Grad.     13

Test G (5 parameters)
  Flares Today    19      Mag. Class     18      Spot Class 3   15
  Bright Pts.     19      Spot Class 2   16

Test H (3 parameters)
  Flares Today    19      Mag. Class     19      Spot Class 2   19








Table 9. Effects of Reduction in the Number of Input Parameters

                          Number of
Test      Forecaster      Parameters     U      W    Off 1   Off 2     R

          COMPARISON                    52.8   79.2   18.6    2.2    2.22

TEST D    MVDA                40        53.8   79.5   18.4    2.1    2.38
          MVDA/CL                       54.6   85.1   13.4    1.5    0.73

TEST A    MVDA                20        53.6   78.9   18.4    2.6    2.63
          MVDA/CL                       54.3   85.5   12.8    1.7    0.60

TEST E    MVDA                15        52.4   78.1   18.7    3.2    2.80
          MVDA/CL                       53.9   85.4   12.8    1.8    0.65

TEST F    MVDA                 8        52.2   78.0   18.6    3.4    2.83
          MVDA/CL                       53.3   85.0   13.4    1.6    0.71

TEST G    MVDA                 5        53.0   77.6   18.5    3.9    3.02
          MVDA/CL                       53.7   84.4   14.0    1.7    0.83

TEST H    MVDA                 3        51.5   78.2   17.5    4.3    2.72
          MVDA/CL                       53.5   85.2   13.3    1.5    0.61

4.4 Reduction in the Number of Parameters

The computer forecast was subjected to a series of reductions (Tests E, F,
G, and H) in the number of input parameters, according to Table 8, with the
corresponding forecast results summarized in Table 9. Table 9 displays the
effects of parameter reduction beginning with 40 parameters and ending with
only three. In addition to the previously used scores we introduce R, the ratio
of the number of matrix entries below the diagonal to the number above the
diagonal. This ratio provides a measure of the asymmetry of the forecast, with
values greater than unity indicating overprediction, and values less than unity
indicating underprediction.
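
The ratio R can be computed from a forecast matrix in a couple of lines. The sketch assumes forecast classes in rows and observed classes in columns, ordered from No Flare to M & X, so that entries below the diagonal are forecasts larger than the observed event; the function name is hypothetical. The two example matrices hold the Test A counts for the comparison and MVDA/CL forecasts.

```python
import numpy as np

def asymmetry_ratio(matrix):
    """Ratio of matrix entries below the diagonal to those above it."""
    m = np.asarray(matrix, dtype=float)
    below = np.tril(m, k=-1).sum()  # forecast class larger than observed
    above = np.triu(m, k=1).sum()   # forecast class smaller than observed
    return below / above

comparison = [[3376, 213, 25], [501, 199, 59], [77, 82, 63]]
mvda_cl = [[3739, 316, 50], [185, 142, 50], [30, 36, 47]]
r_comparison = asymmetry_ratio(comparison)
r_mvda_cl = asymmetry_ratio(mvda_cl)
```

These evaluate to roughly 2.22 and 0.60, in agreement with the R values listed for the comparison forecast and for MVDA/CL in Test A.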

Table 9 clearly illustrates that the reduction in the number of parameters
has a small but unfavorable effect on the computer forecasts. We may regard
the tendencies for R to depart further from unity, for Off 2 to increase, and for
U to decline, as evidence for progressively worsening forecasts. These three
effects are most noticeable in the MVDA forecast, while the latter effect alone is
marginally evident in MVDA/CL.





The effects of the parameter reduction are offset by the increase in the
number of records containing all or most of the parameters submitted for
analysis in the reduced sets. This improvement in representation occurs because
in the reduction steps we usually eliminated those parameters that were least
significant, i.e., those chosen least often in the subsets of the previous test;
and, generally, the lower the significance of a parameter, the more often it is
missing from the data base. It is concluded, therefore, that the decline in
forecast quality in Table 9 would have been more pronounced had all parameters
been present in all records. This proves that there is valuable predictive
information contained in at least some of the less significant parameters. It is
emphasized that, perhaps to a large degree, the lower significance of these
parameters is due only to their frequent absence from the data base.

A final word must be noted regarding the combination parameters. Table 8
indicates that a number of these new parameters have been selected by the
computer program as significant in classifying the outcomes. Due to the complex
intercorrelations among various parameters, however, in addition to possible
variance stabilization effects and other statistical phenomena, we do not fully
understand the true significance of these combination parameters. Questions
such as this probably must await further testing on data bases containing fewer
missing parameters.

4.5 Tests on a Fully Represented Data Base

The most important test of the computer forecast is achieved in the case
where all the parameters submitted to analysis are present in all records of the
data base. Such a test, using the full set of parameters, is impossible with the
presently available data. A test can be made on a fully represented base,
however, if, for example, only eight parameters are used, and we are willing to
accept a reduced base of 3732 records, of which only 2232 remain in the test set.
Such a test (I) was performed, and the results are shown in Tables 10, 11, and 12.
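
Selecting a fully represented base amounts to keeping only those records in which every chosen parameter is present. The sketch below assumes missing values are stored as NaN in a records-by-parameters array; the function name and the column-index argument are hypothetical.

```python
import numpy as np

def fully_represented(records, columns):
    """Keep the records in which none of the selected columns is missing."""
    selected = records[:, columns]
    complete = ~np.isnan(selected).any(axis=1)  # True where all values present
    return records[complete]
```

Applied with the eight parameters of Test I, a filter of this kind is what reduces the base from the full set of records to the smaller fully represented one.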

Test I shows a dramatic improvement in the MVDA/CL computer forecast
in all scores, while the MVDA and comparison forecasts show smaller
improvements. These improvements occur despite the somewhat lower flare
climatology that applies to this particular test set. The fact that the comparison
(subjective) forecast scores are higher indicates that the more complete
observational coverage during this sample of records somehow benefits the
subjective methods also.

Due to the reduced number of records, the errors associated with the Test I
scores are about 50 percent higher than those stated earlier. Nevertheless, there
now seems no question that the MVDA/CL forecast is superior to the others.








Table 10. Comparison of Forecasts Using a Fully
Represented Data Base--Test I (1500-record training set)

Largest Event             Largest Event Observed           Total
Forecasted             No Flare       C       M & X      Forecasts

COMPARISON
  No Flare               1754         90         8          1852
  C                       193         70        24           287
  M & X                    28         31        34            93
  Total Events           1975        191        66          2232

MVDA
  No Flare               1707         67        10          1784
  C                       232         92        24           348
  M & X                    36         32        32           100
  Total Events           1975        191        66          2232

MVDA/CL
  No Flare               1829         97        14          1940
  C                       145         82        33           260
  M & X                     1         12        19            32
  Total Events           1975        191        66          2232


Table 11. Parameters Submitted to Analysis and
Their Frequency of Selection in 9 Subsets--Test I

Flares Today    9      Mag. Class     8      Spot Class 2    5
Bright Pts.     9      Mag. Grad.     8      Spot Class 1    1
Spot Class 3    8      Spot Dynam.    8
Table 12. Comparison of Forecast Scores--Test I

Forecaster    Event        F      E      A      C      U      W    Off 1   Off 2

COMPARISON    No Flare    94.7   88.8   91.8   88.5
              C           24.4   36.6   30.5    8.6   55.5   83.2   15.1    1.6
              M & X       36.6   51.5   44.1    3.0

MVDA          No Flare    95.7   86.4   91.1   88.5
              C           26.4   48.2   37.3    8.6   56.2   82.0   15.9    2.1
              M & X       32.0   48.5   40.2    3.0

MVDA/CL       No Flare    94.3   92.6   93.4   88.5
              C           31.5   42.9   37.2    8.6   58.3   86.5   12.9    0.7
              M & X       59.4   28.8   44.1    3.0


5. CONCLUSIONS AND RECOMMENDATIONS

The conclusions of this study may be summarized as follows:

1. The standard MVDA forecast is very similar to the comparison forecast
used in this study in terms of overall accuracy and bias toward overprediction.

2. The MVDA/CL forecast is superior overall to either the MVDA or the
comparison forecast, and is biased toward underprediction.

3. The optimum size for the training set is probably about 1500 records
for the climatologies that prevailed during 1977 and 1978.

4. "Flares Today" is the most valuable prediction parameter in the data
base used here, with the "Bright Points" parameter a very close second.
Other important parameters are "Magnetic Class," "Magnetic Gradient," "Spot
Class," and "Sunspot Dynamics."

5. Combination parameters, although their role is not fully understood,
seem to improve forecast scores.

6. Some of the often missing parameters (which probably, therefore, only
appear to be less significant as predictors) contain valuable predictive
information. Probable candidates include "Radio Burst/Sweep," "Neutral Line
Changes," "Neutral Line Complexity," and "Emerging Flux."

The MVDA/CL procedure may be capable of producing forecasts superior
to any presently available using conventional, subjective techniques. It has been
shown that its skill becomes markedly evident when complete parameter
representation is achieved in the data base. On the basis of this, we predict that
with improvements in data consistency, as well as the inclusion of new, objective
parameters in the future, the computer forecast scores will continue to improve.


This study has led us to make the following recommendations concerning 
the use of the two computer forecasts: 

1. Provide a flare forecast derived from MVDA/CL for those customers 
who cannot tolerate false flare alarms (note the comparison of F scores in 
Table 12). 

2. Provide a flare forecast derived from standard MVDA for those
customers who need to be forewarned of flares as often as possible (compare E
scores in Table 12).

3. Improve the coverage for the parameters in Table 1 that are deemed
"less significant" by virtue of their frequent absence in the data base.

4. Improve the objectivity and consistency of all parameters.